Stereo Disparity Facilitates View Generalisation during Shape Recognition for Solid Multi-part Objects
Abstract
Current theories of object recognition in human vision make different predictions about whether the recognition of complex, multi-part objects should be influenced by shape information about surface depth orientation and curvature derived from stereo disparity. We examined this issue in five experiments using a recognition memory paradigm in which observers (N=134) memorised and then discriminated sets of 3D novel objects at trained and untrained viewpoints under either mono or stereo viewing conditions. In order to explore the conditions under which stereo-defined shape information contributes to object recognition, we systematically varied the difficulty of view generalisation by increasing the angular disparity between trained and untrained views. In one series of experiments objects were presented from either previously trained views or untrained views rotated (15°, 30° or 60°) along the same plane. In separate experiments we examined whether view generalisation effects interacted with the vertical or horizontal plane of object rotation across 40° viewpoint changes. The results showed robust viewpoint-dependent performance costs: observers were more efficient in recognising learned objects from trained relative to untrained views, and recognition was worse for extrapolated relative to interpolated untrained views. We also found that performance was enhanced by stereo viewing, but only at larger angular disparities between trained and untrained views. These findings show that object recognition is not based solely on 2D image information but that it can be facilitated by shape information derived from stereo disparity.
The human visual system is remarkably adept at recognising three-dimensional (3D) objects across changes in viewpoint, although there is considerable evidence that recognition is not entirely viewpoint invariant (e.g., Arguin & Leek, 2003; Bülthoff & Edelman, 1992; Harris, Dux, Benito, & Leek, 2008; Hayward, 2003; Leek, Atherton, & Thierry, 2007; Tarr & Pinker, 1990). Other work has shown that viewpoint dependency in recognition is influenced by a variety of factors including, for example, shape discriminability (e.g., Hayward & Williams, 2000), complexity (e.g., Bethell-Fox & Shepard, 1988), object geometry (e.g., Leek & Johnston, 2006; Tarr & Pinker, 1990), familiarity (e.g., Leek, 1998a, 1998b) and the level of image classification required by the task (e.g., Hamm & McMullen, 1998). Of interest in the current study is the potential contribution to view generalisation of shape information derived from stereo disparity. Stereo disparity potentially provides useful information about shape that may facilitate view generalisation and recognition, including, for example, surface orientation, surface curvature gradients and polarity, as well as 3D aspect ratio (Norman, Todd, & Phillips, 1995; Norman, Swindle, Jennings, Mullins, & Beers, 2009; Welchman, Deubelius, Conrad, Bülthoff, & Kourtzi, 2005; Wismeijer, Erkelens, Ee, & Wexler, 2010). However, it has been proposed that stereo disparity plays little, if any, role in the perception and recognition of object shape (Li, Pizlo, & Steinman, 2009; Li & Pizlo, 2011; Pizlo, 2010). For example, Pizlo and colleagues have suggested that 3D shape can be recovered from mono-ocular two-dimensional (2D) perceptual input alone when image reconstruction follows geometric simplicity constraints (e.g., Li, Pizlo, & Steinman, 2009; Li & Pizlo, 2011; Pizlo, 2010).
Similarly, current appearance- or image-based models of recognition have proposed that view generalisation is accomplished via interpolation among viewpoint-specific templates that comprise solely 2D image features (e.g., Bülthoff & Edelman, 1992; Riesenhuber & Poggio, 1999; Serre, Oliva, & Poggio, 2007). In contrast, a key assumption of structural description models is that recognition involves the segmentation of complex objects into constituent 3D parts or shape primitives (e.g., Biederman, 1987; Hummel & Stankiewicz, 1996; Marr & Nishihara, 1978; Leek, Reppa, & Arguin, 2005; Leek, Reppa, Rodriguez, & Arguin, 2009). Part segmentation has been assumed to be supported by the recovery of local surface depth orientation and concave curvature minima which arise at part boundaries (Biederman, 1987; Hoffman & Richards, 1984; Marr & Nishihara, 1978) – a hypothesis that is supported by a growing body of evidence highlighting the special functional status of concave curvature minima in shape perception (e.g., Cohen, Barenholtz, Singh, & Feldman, 2005; Davitt, Cristino, Wong, & Leek, 2014; Hoffman & Richards, 1984; Leek, Cristino, Conlan, Patterson, Rodriguez, & Johnston, 2012; Lim & Leek, 2012). Thus, these contrasting theoretical models of shape representation make different predictions about the use of certain kinds of shape information from stereo disparity. Notably, structural description models, unlike 2D image-based models, predict that the recovery of information about local surface depth orientation and concave curvature minima from stereo input should facilitate the derivation of a 3D object model to support view generalisation during recognition.
Footnote 1: We do not claim that stereo disparity provides the only source of shape information about local surface orientation and curvature. Indeed, cues such as shadow and shading gradients, texture and non-accidental properties of edge features that are computable from mono input could also be used to infer surface properties. However, surface depth orientation and part boundaries at surface intersections are enhanced under conditions of stereo viewing due to binocular disparity.
Current studies investigating stereo effects on view generalisation and recognition provide an incomplete picture (Bennett & Vuong, 2006; Burke, Taubert, & Higman, 2007; Burke, 2005; Chan, Stevenson, Li, & Pizlo, 2006; Edelman & Bülthoff, 1990; Liu, Ward, & Young, 2006; Humphrey & Khan, 1992; Lee & Saunders, 2011; Pasqualotto & Hayward, 2009; Rock & DiVita, 1987). For example, Burke (2005) presented stereo images of deformed paperclips in a sequential matching task and reported reduced viewpoint costs when stereo information about object structure was available (see also Bennett & Vuong, 2006; Lee & Saunders, 2011). Similar results were earlier found by Edelman & Bülthoff (1992), also using ‘bent paperclip’ and amoeba stimuli (see also Burke, 2005; Chan et al., 2006; Lee & Saunders, 2011; Rock & DiVita, 1987). However, a limitation of studies using these types of stimuli is that the results are not readily generalizable, as the stimuli cannot be decomposed into parts. Thus, they do not provide the appropriate conditions to test predictions about the potential role of stereo-defined cues to part segmentation in complex 3D objects. There are, however, two reports of studies contrasting mono vs. stereo viewing of multi-part object stimuli.
Humphrey and Khan (1992, Exp. 3) contrasted effects of viewpoint change between learning and test views for novel 3D clay objects under conditions of mono- and stereo-viewing (manipulated by viewing the actual 3D models in either mono- or binocular viewing through a shutter). Training and test views differed by 40 or 80 degrees through rotation about a vertical axis perpendicular to the line of sight. Unfortunately, the results were equivocal. Overall they showed a clear advantage for the recognition of objects at trained views – indicating viewpoint-dependent performance. However, the effects of presentation mode (stereo, mono) were unclear. The results showed an advantage for mono-ocular viewing in response times, but the opposite pattern for accuracy. On the one hand, this could reflect a difference in the way in which shape information from stereo input is processed – where we might expect slower (but more accurate) processing due to the requirement for resolving stereo correspondence. Alternatively, as noted by Humphrey and Khan (1992, pp. 186-187), it could simply reflect a strategic bias by observers to favour accuracy over speed with the stereo displays (i.e., a speed-accuracy trade-off). Unfortunately, it is not possible to distinguish between these two possibilities from the results presented by Humphrey and Khan (1992).
In a more recent study, Pasqualotto and Hayward (2009) contrasted mono versus stereo viewing of 3D computer-generated models of real-world objects in a sequential matching (rather than recognition) task. They found that when observers were required to match objects across large (180 degree) viewpoint differences, stereo viewing produced a performance cost (in both RTs and accuracy). The cost was attributed to a mismatch at large angular disparities between the information computed from mono and stereo input.
This finding appears to challenge the prediction of parts-based models that stereo-defined shape information should facilitate view generalisation and object recognition. However, such a conclusion would be premature. In the first place, the task involved sequential matching rather than recognition, and only recognition requires indexing a long-term memory representation of object shape. In addition, the absence of a stereo advantage for view generalisation may have arisen because observers were able to efficiently match familiar features of the objects across views from 2D input alone – particularly given that the stimuli were highly common ‘real-world’ objects. Indeed, if stereo disparity can potentially facilitate recognition by making explicit 3D image features (e.g., part boundaries at regions of concave surface curvature minima), we might expect to observe stereo effects only when mono-ocular cues to shape alone are insufficient or ambiguous. That is, a stereo advantage in view generalisation is more likely when shape equivalence cannot be determined on the basis of 2D image similarity. A key factor determining 2D image similarity is viewpoint disparity: 2D image similarity for any given object will be greater for small angular differences in viewpoint. At the same time, the earlier work by Bülthoff and Edelman (1992) showed that effects of viewpoint disparity on recognition efficiency are nonlinear – in part because of familiarity: we are more familiar with some views of known objects than others. Notably, viewpoint costs in shape matching are not equivalent for horizontal and vertical axis rotations in depth. Thus, potential effects of stereo disparity on view generalisation must also take into account the axis of depth rotation.
Our goal in this paper was to address these limitations of previous studies in order to examine whether shape information derived from stereo disparity facilitates view generalisation and object recognition as predicted by parts-based structural description accounts. To control prior experience and familiarity with specific views, our studies used a set of CAD-generated, surface-rendered, solid, multi-part, novel objects. The complexity and part structure of the object set were carefully controlled. In addition, we used a recognition memory task that allowed us to examine how the availability of stereo information influences performance when observers must match a perceptual description of object shape to a long-term memory representation at increasing disparities between trained and test views. This also allowed us to contrast performance between horizontal and vertical axis rotations in depth. In a series of five experiments, observers were asked to memorise a subset of four-part novel objects from either mono or stereo visual displays. In a subsequent test phase they discriminated targets (previously learned objects) from visually similar distracters at either trained or novel viewpoints in either mono or stereo. In Experiment 1, we replicated previous studies by contrasting performance between mono and stereo displays across trained and untrained interpolated viewpoints around the horizontal plane. In Experiments 2a and 2b we examined whether stereo viewing effects were modulated by task difficulty. In these two experiments, trained and interpolated viewpoints were on the same axis while extrapolated views were on the orthogonal axis. In Experiment 2a, the trained viewpoints were on the horizontal axis, while in Experiment 2b the trained viewpoints were on the vertical axis. In Experiments 3a and 3b we examined the boundary conditions for stereo viewing effects by reducing the angular disparity between trained and untrained views.
Publication date: 2015